#2 Text as Data
Faculty of Humanities and Social Sciences
University of Lucerne
01 March 2025
Max Gruber / Better Images of AI
World
Cognition
Language
Language shapes the way we think, and determines what we can think about.
Benjamin Lee Whorf
Thus, quantification and qualitative analysis go well together.
(see also Grimmer and Stewart 2013)
Texts with a challenging structure are not texts without structure.
task.txt
.txt .csv .tsv …
.docx .pdf .html .xml …
task_20240229.pdf instead of task_new_final.pdf
What other terms have been used to describe nature?
What environmental issues are debated the strongest? When?
Are there any differences between languages? Between corpus versions?
| Operator | Description |
|---|---|
| + | sums multiple expressions to aggregate trends. |
| - | subtracts an expression from another to measure one ngram relative to another. |
| / | divides the expression by another one for isolating the behavior of an ngram with respect to another. |
| * | multiplies the expression by a number to compare ngrams of very different frequencies. (Enclose the ngram in parentheses so that * isn’t interpreted as a wildcard.) |
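The four operators above amount to year-by-year arithmetic on frequency series. The following is a minimal sketch of that idea in Python, not the Ngram Viewer's own implementation: the numbers are invented for illustration, and the helper names `combine` and `scale` are hypothetical.

```python
# Made-up relative frequencies per year, for illustration only.
# These are NOT real Google Ngram values.
great_war = {1920: 4.0, 1950: 2.0, 1980: 1.0}
world_war_i = {1920: 0.0, 1950: 2.0, 1980: 4.0}


def combine(a, b, op):
    """Apply op year by year to two series that cover the same years."""
    return {year: op(a[year], b[year]) for year in a}


def scale(a, factor):
    """Multiply every yearly value of a series by a constant factor."""
    return {year: value * factor for year, value in a.items()}


# "+": aggregate two ways of naming the same event into one trend.
total = combine(great_war, world_war_i, lambda x, y: x + y)

# "-": how much more (or less) frequent one phrase is than the other.
difference = combine(great_war, world_war_i, lambda x, y: x - y)

# "/": share of one phrase relative to the combined trend.
share = combine(great_war, total, lambda x, y: x / y)

# "*": rescale a rare phrase so it can be plotted next to a frequent one.
scaled = scale(great_war, 10)

print(total)
print(difference)
print(share)
print(scaled)
```

Working on plain dictionaries keeps the arithmetic visible. The sketch assumes both series cover exactly the same years and that the divisor is never zero; real data would need a check for both.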
Google Ngram Viewer: Evolution of the phrase ‘attention’
Google Ngram Viewer: Evolution of phrases occurring with ‘culture’
Likely both.
Similarly, language may vary across regions and communities.
Read, read, read to complement stats with context!
The Great War → World War I
🤓 Use a better alternative: Bookworm HathiTrust
Lazer, David, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy, and Marshall Van Alstyne. 2009. “Computational Social Science.” Science 323(5915):721–23.
(via OLAT)
Graham, Shawn, Ian Milligan, and Scott Weingart. 2015. Exploring Big Historical Data: The Historian’s Macroscope. Open Draft Version. Under contract with Imperial College Press.